On the Automatic Learning of Bilingual Resources: Some Relevant Factors for Machine Translation
نویسندگان
چکیده
In this paper we present experiments concerned with automatically learning bilingual resources for machine translation: bilingual dictionaries and transfer rules. The experiments were carried out with Brazilian Portuguese (pt), English (en) and Spanish (es) texts in two parallel corpora: pt–en and pt–es. They were designed to investigate the relevance of two factors in the induction process, namely: (1) the coverage of linguistic resources used when preprocessing the training corpora and (2) the maximum length threshold (for transfer rules) used in the induction process. From these experiments, it is possible to conclude that both factors have an influence in the automatic learning of bilingual resources.
منابع مشابه
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...
متن کاملThe Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
متن کاملA new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
متن کاملLearning Translation Templates From Bilingual Text
This paper proposes a two-phase example-based machine translation methodology which develops translation templates from examples and then translates using template matching. This method improves translation quality and facilitates customization of machine translation systems. This paper focuses on the automatic learning of translation templates. A translation template is a bilingual pair of sen...
متن کاملMachine Learning-based Sentiment Analysis of Automatic Indonesian Translations of English Movie Reviews
Sentiment analysis is the automatic classification of the overall opinion conveyed by a text towards its subject matter. This paper discusses an experiment in the sentiment analysis of of a collection of movie reviews that have been automatically translated to Indonesian. Following [1], we employ three well known classification techniques: naive bayes, maximum entropy, and support vector machin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008